Yinglin Xia and Jun Sun

Bioinformatic and Statistical Analysis of Microbiome Data

From Raw Sequences to Advanced Modeling with QIIME 2 and R

Yinglin Xia
Department of Medicine, University of Illinois Chicago, Chicago, IL, USA
Jun Sun
Department of Medicine, University of Illinois Chicago, Chicago, IL, USA
ISBN 978-3-031-21390-8 e-ISBN 978-3-031-21391-5
© Springer Nature Switzerland AG 2023

This Springer imprint is published by the registered company Springer Nature Switzerland AG

The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

To our late grandmothers:

Mrs. Zhong Yilan (钟宜兰), Mrs. Sun Zhangqiong (孙章琼), and Mrs. Zhong Yizhen (钟宜珍) for their deep, constant love and support.

Preface

Over the past two decades, microbiome research has received much attention across diverse fields and has become a topic of great scientific and public interest.

The microbiome is invisible, yet it exists within a body space or a particular environment. It is essential for development, immunity, and nutrition, and it can change our health status and the progression of multiple diseases. The human microbiome has been described as an “essential organ” of the human body, “the invisible organ,” “the forgotten organ,” and “the last human organ” under active research, which highlights its importance in health.

Compared with data from other research fields, microbiome data are complicated and have several unique characteristics. Thus, choosing an appropriate statistical test or method is a very important step in the analysis of microbiome data. However, this remains a challenging task for biomedical researchers without a statistical background and for biostatisticians without research experience in this field. An appropriate statistical test or method is chosen based not only on the assumptions and properties of the statistical methods but also on sufficient knowledge of the type and unique characteristics of the collected data and the objective of the study.

We have done pioneering work to establish the statistics and analysis of microbiome data. In October 2018, we published Statistical Analysis of Microbiome Data with R (Springer Nature), in which we described a framework for the statistical analysis of microbiome data. As the first statistical analysis book in microbiome research, it has received positive feedback from readers around the world and a positive review in Biometrical Journal (Dr. Kim-Anh Lê Cao, Biometrical Journal, Vol. 61, 2019). That book focused on statistical analysis of microbiome data and covered bioinformatic analysis in a single chapter. Bioinformatic analysis is nevertheless a very important topic in microbiome research: the quality of the read counts it generates has a large impact on the quality of the downstream statistical analysis. We therefore believe it is very important to describe the workflows that lead from bioinformatic analysis to statistical analysis of microbiome data. Since October 2018, right after publishing our first book, we have planned to write a new book on combined bioinformatic and statistical analysis of microbiome data. This idea was motivated by readers of the 2018 book who asked us to provide more coverage of bioinformatic analysis.

This book is built around one of the most important workflows for microbiome data analysis: from raw 16S rRNA sequencing reads to statistical analysis. We aim to provide a comprehensive review of statistical theories and methods for microbiome data analysis and to discuss the development of bioinformatic and biostatistical methods and models in microbiome research. In particular, we aim to provide step-by-step procedures for performing bioinformatic and statistical analysis of microbiome data.

Profiling of bacterial communities through bacterial 16S rRNA sequencing is one important approach to bioinformatic analysis of microbiome data, and it is the approach on which this book focuses. Another important approach is shotgun metagenomics. Shotgun metagenomic methods allow direct whole-genome shotgun sequencing of the microbiome metagenomic DNA, which has significantly improved our understanding of microbial community composition in ecosystems (e.g., the human body).

The microbiome is the collection of all microorganisms. Emerging evidence and needs have shown the importance of viruses, fungi, and other microbes. No approach as standard as bacterial 16S rRNA sequencing is available for analysis of the viral community, and profiling of viral communities is still very challenging. In recent years, however, several viral metagenomic data analysis tools have been developed to characterize different features of viruses, including viral discovery, virome composition analysis, taxonomy classification, and functional annotation. Although our initial interest started from the 16S rRNA sequencing approach, we hope to have opportunities in the future to discuss the analysis of other microbial data, particularly the profiling and analysis of viral communities.

Statistical tools for performing microbiome data analysis are now available in different languages and environments across different platforms, through either web-based or programming-based approaches. The R system and environment plays a critical role in developing statistical methods and models for analyzing microbiome data. QIIME 2 is a bioinformatic analysis platform that wraps other sequencing analysis tools and also provides basic statistical analysis. Because of its comprehensive features and documentation support, QIIME 2 is one of the most popular bioinformatic tools for the analysis of microbiome data. Thus, in this book, we leverage the capabilities of R and QIIME 2 for bioinformatic and statistical analysis of microbiome data.

This book has 18 chapters and is organized as follows. The first two chapters provide an overview and introduction of QIIME 2 and R in the analysis of microbiome data, respectively. Chapters 3 to 6 present bioinformatic analysis of microbiome data, mainly through QIIME 2.

Chapter 7 introduces the original Operational Taxonomic Unit (OTU) methods in numerical taxonomy, and Chapter 8 describes the movement beyond OTU methods that has arisen in the microbiome research field. Chapters 9 to 18 present biostatistical analysis of microbiome data, mainly through R and also through QIIME 2.

Chapter 3 describes the basic data processing in QIIME 2. Chapter 4 introduces how to build a feature table and feature data from raw sequencing reads. Chapter 5 introduces assigning taxonomy and building a phylogenetic tree. Chapter 6 introduces taxonomic classification of the representative sequences and how to cluster Operational Taxonomic Units (OTUs). Chapter 7 comprehensively describes the development of OTU methods in numerical taxonomy, which provides a theoretical background for the clustering-based OTU methods that are used in bioinformatic analysis of microbiome data. Chapter 8 describes the movement beyond OTU methods that has arisen in the microbiome research field and provides a comprehensive review of bioinformatic analysis of microbiome data.

Chapters 9 and 10 provide two basic statistical analyses of microbiome data: Chapter 9 introduces alpha diversity metrics and visualization, and Chapter 10 introduces beta diversity metrics and ordination.

Chapters 11 to 18 present more advanced statistical methods and models in microbiome research. Chapter 11 introduces nonparametric methods for multivariate analysis of variance in ecological and microbiome data and statistical testing of beta diversity in microbiome data. Chapter 12 discusses differential abundance analysis of microbiome data, mainly through the metagenomeSeq package. Chapter 13 presents zero-inflated beta models for microbiome data. Chapter 14 introduces compositional data and specifically the newly developed models for compositional analysis of microbiome data. Chapter 15 introduces linear mixed-effects models and describes how to use them for the analysis of longitudinal microbiome data. Chapter 16 describes generalized linear mixed models (GLMMs), including a brief history of generalized linear models and generalized nonlinear models, algorithms for fitting GLMMs, and statistical hypothesis testing and modeling in GLMMs. Chapter 17 specifically introduces the newly developed GLMMs for longitudinal microbiome data and the adoption of GLMMs from other fields to analyze longitudinal microbiome data. Chapter 18 provides an overview of multivariate longitudinal microbiome data analysis and specifically introduces the newly developed non-parametric microbial interdependence test. The large P small N problem is also discussed in this last chapter.

We hope the contents and organization of these chapters will provide a set of basic concepts of the microbiome and a framework for bioinformatic and statistical analysis of microbiome data. We expect this book to be used by (1) graduate students who study bioinformatic and statistical analysis of microbiome data; (2) bioinformaticians and statisticians working on microbiome projects, either for their own research or for collaborative research involving experimental design, grant applications, and data analysis; and (3) researchers who investigate biomedical and biochemical projects involving the microbiome and multi-omics data analysis. The datasets and the R and QIIME 2 commands used in this book are available from Springer’s website or by request from the first author, Yinglin Xia, at yinglin.xia2007@gmail.com.

Yinglin Xia, Chicago, IL, USA
Jun Sun, Chicago, IL, USA
May 2022

Acknowledgments

The authors wish to thank the editors and staff at Springer Nature for their feedback throughout the publishing process. Very special thanks go to Mrs. Merry Stuber, Senior Editor, New York, for her enthusiasm in supporting and guiding the present project from beginning to end. We also wish to thank Dr. Cherry Ma, Managing Editor, Dr. Yu Zhu, Senior Publisher, and Mrs. Emily Zhang, Editor, for their help in processing the book proposal review.

We thank the three anonymous reviewers for their positive reviews of the book proposal. We especially thank the two anonymous reviewers for their very positive and constructive reviews of the first draft of this book. Their constructive feedback was helpful in improving our revision of this book. More broadly, we greatly appreciate the developers of bioinformatic and statistical methods, models, and R packages, and of the R system and environment in general. Without their great work, this book would not be available in its current breadth and scope. Our special thanks go to Dr. Joseph Paulson for sharing his research paper with us.

Our path to the current academic journey and success was paved by support from our families over generations. We wish to express our deepest appreciation to our respective parents and parents-in-law (Xincui Wang, Qijia Xia, Xiao-Yun Fu, and Zong-Xiang Sun) and to our respective families (Yuxuan Xia and Jason Xia) for their love and support.

Yinglin’s grandmother Zhong Yilan (钟宜兰) was very kind and generous. She was always willing to help others and encouraged him to help people who were in need. She took care of Yinglin’s everyday life until he left his village to attend high school in the county seat. Yinglin remembers that, before he started school, his grandmother Zhong Yilan always brought him with her to visit their relatives. Yinglin’s maternal grandmother Zhong Yizhen (钟宜珍) was very kind and skillful. When Yinglin was young, every year his grandmother Zhong Yizhen gave the family delicious homemade foods, delicate hand-knitted straw hats, and hand-woven straw fans as gifts. She had only one daughter, Yinglin’s mother. She lived with Yinglin’s parents for several years late in her life.

Jun’s grandmother Sun Zhangqiong (孙章琼) was positive, warm-hearted, and open-minded. In a traditional society, women’s role was narrowly defined around the family. However, she always encouraged her granddaughter to chase her dreams, to find her intellectual potential, and to work for society. Her vision and values for life have had a deep influence on Jun.

We would like to dedicate this book to our late grandmothers, Mrs. Zhong Yilan (钟宜兰), Mrs. Sun Zhangqiong (孙章琼), and Mrs. Zhong Yizhen (钟宜珍) for their deep, constant love and support. May this book honor their memory and legacy.

Finally, we would like to acknowledge the VA Merit Award 1101BX004824-01, the DOD grant W81XWH-20-1-0623 (BC191198), the Crohn’s & Colitis Foundation Senior Research Award (902766), and the NIDDK/National Institutes of Health grants R01 DK105118 and R01DK114126 to Jun Sun. The study sponsors played no role in the study design or in the collection, analysis, and interpretation of data. The contents do not represent the views of the United States Department of Veterans Affairs or the United States Government.

About the Authors
Yinglin Xia is a Research Professor in the Department of Medicine at the University of Illinois Chicago (UIC). He was a Research Assistant Professor in the Department of Biostatistics and Computational Biology at the University of Rochester (Rochester, NY) before joining AbbVie (North Chicago, IL) as a Clinical Statistician. He joined UIC as a Research Associate Professor in 2015. Dr. Xia has successfully applied his statistical study design and data analysis skills to clinical trials, medical statistics, biomedical sciences, and social and behavioral sciences. He has published more than 140 statistical methodology and research papers in peer-reviewed journals. He serves on the editorial boards of several scientific journals, including as an Associate Editor of Gut Microbes, and has served as a reviewer for over 100 scientific journals. Dr. Xia has published three books on statistical analysis of microbiome and metabolomics data. He is the lead author of Statistical Analysis of Microbiome Data with R (Springer Nature, 2018), which was the first statistics book in microbiome research; Statistical Data Analysis of Microbiomes and Metabolomics (American Chemical Society, 2022); and An Integrated Analysis of Microbiomes and Metabolomics (American Chemical Society, 2022).

Jun Sun is a tenured Professor of Medicine at the University of Illinois Chicago. She is an elected fellow of the American Gastroenterological Association (AGA) and the American Physiological Society (APS). She chairs the AGA Microbiome and Microbial Therapy section (2020–2022). She is an internationally recognized expert on the microbiome and human diseases, with work on topics such as vitamin D receptor in inflammation, dysbiosis, and intestinal dysfunction in amyotrophic lateral sclerosis (ALS). Her lab was the first to discover the chronic effects and molecular mechanisms of Salmonella infection and development of colon cancer. Dr. Sun has published over 210 scientific articles in peer-reviewed journals and 8 books on the microbiome. She is on the editorial boards of more than 10 peer-reviewed international scientific journals and serves on study sections for national and international research foundations. Dr. Sun is a believer in scientific art and artistic science. She enjoys writing her science papers in English and her poems in Chinese. Her poetry collection《让时间停留在这一刻》(“Let Time Stay Still at This Moment”) was published in 2018.